ABSTRACT 

In plants, microRNAs (miRNAs) regulate their mRNA 
targets by precisely guiding cleavages between 
the 10th and 11th nucleotides in the complemen- 
tary regions. High-throughput sequencing-based 
methods, such as PARE or degradome profiling 
coupled with a computational analysis of the 
sequencing data, have recently been developed for 
identifying miRNA targets on a genome-wide scale. 
The existing algorithms limit the number of mis- 
matches between a miRNA and its targets and 
strictly do not allow a mismatch or G:U wobble 
pair at position 10 or 11. However, evidence 
from recent studies suggests that cleavable targets 
with more mismatches exist, indicating that 
relaxed criteria can find additional miRNA targets. 
In order to identify targets including the ones 
with weak complementarities from degradome 
data, we developed a computational method called 
SeqTar that allows more mismatches and, critically, 
a mismatch or G:U pair at position 10 or 
11. Specifically, two statistics were introduced in 
SeqTar, one to measure the alignment between 
miRNA and its target and the other to quantify 
the abundance of reads at the center of the miRNA 
complementary site. By applying SeqTar to publicly 
available degradome data sets from Arabidopsis 
and rice, we identified a substantial number of 
novel targets for conserved and non-conserved 
miRNAs in addition to the reported ones. 
Furthermore, using the RLM 5'-RACE assay, we experimentally 
verified 12 of the novel miRNA targets 



(6 each in Arabidopsis and rice), of which some 
have more than 4 mismatches and have mismatches 
or G:U pairs at the position 10 or 11 in the miRNA 
complementary sites. Thus, SeqTar is an effective 
method for identifying miRNA targets in plants 
using degradome data sets. 

INTRODUCTION 

MicroRNAs (miRNAs) are non-coding RNAs that 
regulate the expression of protein-coding genes mainly 
at the post-transcriptional level in plants and animals 
(1). In plants, miRNAs are known to induce cleavages 
of their mRNA targets between the 10th and 11th nucleotides 
within nearly perfect complementary sites (2,3). This 
nearly perfect complementarity has extensively been used 
to predict miRNA targets in plants (2,4-13). However, 
such sequence complementarity-based methods often 
produce a large number of false positive predictions, 
which makes experimental validation, e.g. using a 
modified 5'-RACE assay (14), costly. 

With the advance of next-generation sequencing 
technologies, a genome-wide strategy, namely the degra- 
dome or PARE (14,15), has been developed to directly 
profile the mRNA cleavage products induced by small 
regulatory RNAs, shorthanded as sRNAs that include 
miRNAs and small interfering RNAs (siRNAs). In this 
method, the 5'-ends of polyadenylated products of 
sRNA-mediated mRNA decay are sequenced and subse- 
quently aligned to the cDNA sequences to detect mRNA 
cleavage sites and quantify the abundance of cleavage 
products to determine the effects of sRNA-guided gene 
expression regulation. Currently, CleaveLand (16) is the 
only publicly available computational method for iden- 
tifying plant miRNA targets from degradome data 
(15,17-22). CleaveLand scores sRNA complementary sites 
based on a mismatch-based scoring scheme (4,6), i.e. (i) a 
mismatch in an sRNA complementary site is given a 
score of 1 and a G:U pair is given a score of 0.5; (ii) a 
mismatch or a G:U pair in the core region from 2 to 13 nt 
receives a double score (6,15); (iii) neither a mismatch nor 
a G:U pair at positions 10 and 11 in a complementary site is 
allowed (7). Generally, sRNA complementary sites with 
scores of <4 were used in identifying miRNA targets 
(6,15). In sharp contrast to this restrictive scheme, some 
miRNA complementary sites with scores of >4 can also 
guide the cleavage of their target transcripts. For instance, 
ath-miR390 is able to guide the cleavage at its 3' comple- 
mentary site of TAS3b transcript despite having a score of 
7 (corresponding to 6.5 mismatches) (9,23); ath-miR159a 
can induce the cleavage of AT5G18100 although their 
complementary site has a score of 6.5 (corresponding to 
4.5 mismatches) (14); miR398-guided cleavage of CCS1 is 
detected despite having a score of 6 (corresponding to 5.5 
mismatches) (19); miR167 can lead to the cleavage of 
Os06g03830 despite having a mismatch at position 11 
(19); and ath-miR173 can lead to the cleavage of 
AT1G50055 even though position 10 of their binding site is 
a mismatch (6). These observations suggest that the 
criteria adopted in CleaveLand are too stringent and 
omit many genuine targets, and relaxation of current 
criteria can identify additional novel targets for miRNAs 
from the degradomes. 
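
The mismatch-based scoring rules described above can be illustrated with a minimal Python sketch. It is not CleaveLand's code: the function name mismatch_score, the ungapped position-by-position pairing and the inclusive reading of the 2 to 13 nt core region are assumptions made for illustration only.

```python
# Hypothetical helper illustrating the mismatch-based scoring rules;
# names and the ungapped pairing model are assumptions, not CleaveLand code.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def mismatch_score(srna, site, core=(2, 13), forbidden=(10, 11)):
    """Score an ungapped sRNA:site pairing.

    srna is written 5'->3' and site is the target segment written 3'->5',
    so srna[i] pairs with site[i]. Positions are counted from the 5'-end
    of the sRNA, starting at 1. Returns None when a mismatch or G:U pair
    falls on a forbidden position (10 or 11 by default).
    """
    if len(srna) != len(site):
        raise ValueError("srna and site must have the same length")
    score = 0.0
    for pos, pair in enumerate(zip(srna.upper(), site.upper()), start=1):
        if pair in WATSON_CRICK:
            continue
        if pos in forbidden:
            return None  # rule (iii): site rejected
        penalty = 0.5 if pair in WOBBLE else 1.0  # rule (i)
        if core[0] <= pos <= core[1]:
            penalty *= 2  # rule (ii): double score in the core region
        score += penalty
    return score
```

Under this sketch, a site would pass the restrictive scheme only if the function returns a number within the chosen score cutoff; a return value of None corresponds to a site excluded by rule (iii), which is exactly the class of sites a relaxed criterion would retain.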

In order to fully utilize the large amount of degradome 
data for identifying miRNA targets particularly those with 
more mismatches, we developed a novel method called 
SeqTar (SEQuencing-based sRNA TARget prediction). 
To reduce the false positive predictions when allowing 
more mismatches, two P-values were introduced in the 
method to control the qualities of its predictions. 
Specifically, the number of mismatches in an sRNA complementary 
site is assigned a P-value, P_m, based on the 
shuffled sRNA sequences against randomly chosen 
target sequences, and the number of reads accumulated 
at the central region of the sRNA complementary site, 
the 9th to 11th nt from the 5'-end of the miRNA, is given 
another P-value, P_v, by a Binomial test. The reads 
mapped to the 9th to 11th nt are named valid reads. 

On two degradome data sets from Arabidopsis (14) and 
one from rice (19), SeqTar identified 231 and 268 novel 
sRNA:target pairs with less than 3.5 mismatches and 
with at least 5 valid reads, respectively. Among these 
pairs, 103 and 92 sRNA:target pairs have significant 
numbers of valid reads with P_v < 10^-5 in Arabidopsis 
and rice, respectively. Using a modified 5'-RACE 
(see 'Materials and Methods' section), we experimentally 
validated six sRNA targets each for Arabidopsis and 
rice. Most of these 12 sRNA:target pairs 
have more than 4 mismatches. More importantly, some 
of these verified miRNA:target pairs have mismatches 
or G:U pairs at positions 10 or 11. Furthermore, we 
identified thousands of sRNA:target pairs that showed 
strong accumulations of reads in the central regions 
(P_v < 10^-5) but had more than three mismatches in both 
Arabidopsis and rice. These results demonstrated that 
SeqTar is an effective method for finding sRNA targets 
from plant degradomes. Our analysis also revealed that 
more transcripts are cleaved by sRNA-guided RISC in 
both Arabidopsis and rice than previously reported. 



MATERIALS AND METHODS 

Degradome and sequence data sets used 

The two Arabidopsis degradome data sets (GSM280226, 
denoted as WT, and GSM280227, denoted as xrn4) (14) 
and one rice degradome data set (GSE17398, denoted as 
osa) (19) were downloaded from the NCBI GEO 
database. Two other studies (18,20) also generated 
degradome data from rice but both of them produced 
substantially fewer reads than the data set of Li et al. (19). 
Thus, the rice degradome of Li et al. (19) was chosen for 
analysis. 

The cDNA sequences of Arabidopsis and rice were 
downloaded from the TAIR database (r9, http://www.tair.org) 
and the Rice Genome Annotation Project 
(r6.1, http://rice.plantbiology.msu.edu/), respectively. 
The sequences of TAS3a/b/c of rice were retrieved from 
the NCBI EST database, under the accession numbers 
EU293144, AU100890 and CA765877 (19), respectively. 

The sequences of mature miRNAs were obtained from 
the miRBase (24) (version 16, http://www.mirbase.org/) 
and the unique miRNA sequences were used in the 
analysis. TasiRNAs of Arabidopsis TAS1 to TAS4 were 
collected from the Arabidopsis Small RNA Project 
Database (http://asrp.cgrb.oregonstate.edu). Some Arabi- 
dopsis small RNAs derived from PPR genes [reported in 
(15)] were also used in this study. The rice tasiRNAs were 
obtained from (19). All small RNA sequences used were 
provided in Supplementary Table S12. 

Sequence alignment 

SeqTar used a modified Smith-Waterman algorithm to 
align an sRNA to a target sequence. Briefly, instead of 
performing alignments with matched nucleotides, e.g. 
A-A and C-C, SeqTar found complementary nucleotides, 
i.e. G-C, A-U and G-U Wobble pairs that had rewards of 
+6, +4 and +2, respectively, in alignment. An affine gap 
penalty, i.e. a penalty increasing linearly with the length 
of the gap after the initial gap opening penalty, was used, with 
a gap opening penalty of -8 and a gap extension penalty of -4. The algorithm 
gave a penalty of -3 to a known mismatch and a penalty 
of -1 to a mismatch of unspecified nucleotides (i.e. 'N') 
in mRNAs. 
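
The rewards and penalties listed above can be written down as a small scoring table. The following Python sketch is not SeqTar's Java implementation; the names pair_score, gap_cost and ungapped_score are hypothetical, T is treated as U so that cDNA input can be scored, and the gap opening penalty is assumed to cover the first gap position.

```python
# Hypothetical scoring helpers based on the rewards and penalties in the text.
REWARD = {
    ("G", "C"): 6, ("C", "G"): 6,   # G-C pair
    ("A", "U"): 4, ("U", "A"): 4,   # A-U pair
    ("G", "U"): 2, ("U", "G"): 2,   # G-U wobble pair
}
MISMATCH = -3      # known mismatch
UNSPECIFIED = -1   # 'N' in the mRNA
GAP_OPEN = -8
GAP_EXTEND = -4


def pair_score(srna_nt, mrna_nt):
    """Score one sRNA nucleotide against the opposing mRNA nucleotide."""
    a = srna_nt.upper().replace("T", "U")
    b = mrna_nt.upper().replace("T", "U")
    if b == "N":
        return UNSPECIFIED
    return REWARD.get((a, b), MISMATCH)


def gap_cost(length):
    """Affine gap cost; assumes the opening penalty covers the first position."""
    if length <= 0:
        return 0
    return GAP_OPEN + GAP_EXTEND * (length - 1)


def ungapped_score(srna, site_3to5):
    """Sum pair scores for an ungapped pairing (site written 3'->5')."""
    return sum(pair_score(a, b) for a, b in zip(srna, site_3to5))
```

A full local alignment would maximize these scores over all placements and gap configurations; the sketch only shows how a single candidate pairing would be scored.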

SeqTar next used shuffled sRNA sequences to evaluate 
predicted sRNA complementary sites, which was a 
standard way to evaluate predicted binding sites of plant 
sRNAs (2,4). One hundred dinucleotide shuffled sRNAs 
were generated for a given sRNA sequence. Each of these 
shuffled sRNAs was used to predict complementary sites 
on one target sequence randomly chosen from the pool of 
all target sequences. Finally, the numbers of mismatches 
of these 100 sRNA:target pairs were used to evaluate 
the P-value, P_m, of the number of mismatches, m, 
in the sRNA's complementary site by assuming a 
Student's t-distribution. 

Reads distributions 

The unique sequences of a degradome data set were 
aligned to the transcript (cDNA) sequences with the 
BLASTN program. Then, the abundance of a matched 
locus was obtained by dividing the read count of a unique 
sequence by the number of its perfectly matched loci 
in all transcript sequences. Initially, SeqTar scanned the 
BLASTN results to obtain the normalized abundance at 
each position on a transcript. Then, SeqTar calculated the 
accumulation of reads in the central region of an sRNA 
complementary site, i.e. reads starting at positions 
opposite to the 9th to 11th nt region from the 5'-end of the sRNA. 
Although major cleavages often took place between the 
10th and 11th nt, minor cleavages between 9th and 10th 
or 11th and 12th nt had also been reported (6,11,25). 
Among the reads mapped to different positions on the 
target transcript, some reads could have been generated 
by sRNA-guided cleavage events and were named as valid 
reads, v. Thus, it was assumed that the degradation 
products of a target followed a Binomial distribution, 
where the reads mapped to the central region of an 
sRNA complementary site were treated as preferred 
(positive) samples and other reads as control (negative) 
ones. The probability of valid reads, P_v, was calculated 
by Equation 1. 

\[ P_v(x) = \binom{n}{x}\, q^{x} (1 - q)^{n - x} \qquad (1) \]

where x = max(n_9, n_10, n_11), n_9 to n_11 were the numbers of 
reads mapped to the positions opposite to the 9th to 11th nt 
of the sRNA, respectively, n was the total number of reads 
that were mapped to the whole target sequence, and q was 
a constant that stands for the probability that a mapped 
read was from any given nucleotide of the target sequence. If no 
sRNA was involved in the degradation of a target, there 
was no reason to assume that one position would be more 
likely to break down than other positions. Therefore, each 
position of the target sequence was assumed to have the 
same probability to produce a degradation product by 
assuming a Uniform distribution on the degradation 
products of a transcript. Accordingly, q in Equation 1 was 
assigned a value of 1/(l - (r - 1)), where l was the length of 
the target sequence and r was the length of a degradome 
read, since the last r - 1 positions of the target sequence 
could not be detected with the sequencing reads. In the 
current implementation of SeqTar, P_v values < 10^-300 were 
regarded as 0. It was important to note that although 
the valid reads, v, were all the reads mapped to the 
9th to 11th positions, P_v was calculated from the largest 
number of reads of these three positions. This was 
because P_v was used to evaluate whether the major 
cleavage position was preferred by the sRNA-guided 
RISC complex. 
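
The calculation of the valid-read probability can be sketched in Python as follows. This is an illustration under stated assumptions rather than SeqTar's code: the function names are hypothetical, read counts are assumed to be non-negative integers, and the binomial term is evaluated in log space to avoid overflow for deeply sequenced transcripts.

```python
import math


def position_probability(target_len, read_len):
    """q = 1 / (l - (r - 1)): uniform chance that a read starts at a position."""
    usable = target_len - (read_len - 1)
    if usable <= 0:
        raise ValueError("target must be at least as long as a read")
    return 1.0 / usable


def p_valid_reads(n9, n10, n11, n_total, target_len, read_len):
    """Binomial probability of Equation 1 with x = max(n9, n10, n11)."""
    x = max(n9, n10, n11)
    if x > n_total:
        raise ValueError("valid reads cannot exceed total reads")
    q = position_probability(target_len, read_len)
    if q == 1.0:
        # Only one detectable position: every read must start there.
        return 1.0 if x == n_total else 0.0
    log_p = (
        math.lgamma(n_total + 1)
        - math.lgamma(x + 1)
        - math.lgamma(n_total - x + 1)
        + x * math.log(q)
        + (n_total - x) * math.log1p(-q)
    )
    p = math.exp(log_p)
    # Values below 1e-300 are regarded as 0, as stated in the text.
    return 0.0 if p < 1e-300 else p
```

For example, with 10 reads on a 29-nt target, 20-nt reads and at most 3 reads at any of the three central positions, q is 0.1 and the function returns the binomial probability of 3 successes in 10 trials.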

The computational steps and outputs of SeqTar 

The major steps of SeqTar were shown in Supplementary 
Methods. All computational steps of SeqTar had been 
integrated into a single script, and its major steps, including 
the SeqTar algorithm itself, were implemented in the Java programming 
language. SeqTar had been used on the Linux operating 
system and was available for non-commercial purposes 
upon request. 

SeqTar produced six output files: the first listed the 
sRNA:target pairs; the second showed the alignments 
of sRNA complementary sites; the third provided the 
MATLAB scripts for generating the T-plots of target 
mRNAs; the fourth gave the number of reads perfectly 
mapped to target mRNAs; the fifth listed the scores of 
shuffled sRNAs used to evaluate the P_m values; and 
the last provided the potential novel sRNA candidates. 
As suggested by German et al. (14), SeqTar predicted 
a potential sRNA if an accumulation of reads was 
found at a specific position, named as a peak, on a 
target but no input sRNAs contributed to this accumula- 
tion. Additional details of outputs were given in the 
Supplementary Methods. The first file consisted of 
33 columns to show the information of a miRNA:target 
pair, such as the number of valid reads, the P-value of 
valid reads P_v, the number of mismatches, the P-value 
of mismatches P_m and the percentage of valid reads. 
A detailed description of these columns was also given 
in Supplementary Methods. 

Performance evaluation 

To evaluate the performance of SeqTar, we compared 
its prediction results with those reported in the literature. 
The verified or predicted Arabidopsis sRNA targets 
(2,4,6,7,9,14,15,26-29) were combined and duplicate pairs 
were removed, resulting in a list of 428 sRNA:target 
pairs for Arabidopsis (Supplementary 
Table S1). A total of 230 of these 428 pairs were validated 
targets of 28 conserved sRNA families and summarized 
in Table 1. Similarly, 458 sRNA:target pairs of rice 
(Supplementary Table S2) were obtained from the 
reported results (18-20,28,30-38). Of these, 123 targets 
of 21 conserved sRNA families were previously validated 
and summarized in Table 1. We also compared 
SeqTar's results with those of the CleaveLand pipeline 
(16) reported recently in the starBase (39). 

Experimental validation using 5'-RACE assay 

The RLM 5'-RACE assay was performed to experimentally 
validate 19 predicted targets listed in Supplementary 
Table S13 by using the GeneRacer Kit (Invitrogen). 
Briefly, total RNA from Arabidopsis and rice was 
ligated with a 5'-RNA adapter and reverse transcription 
was performed using oligo(dT). The resulting cDNA was 
used as a template for nested PCR. The first PCR was 
performed using the GeneRacer 5' primer and a gene-specific 
primer. The second PCR was performed using the GeneRacer 
5' nested primer and a gene-specific nested primer. The 
amplified products were gel purified, cloned into the pGEM 
T-easy vector and sequenced. Gene-specific primers used 
in this study were listed in Supplementary Table S13. 

Transient co-expression of miR172 and novel target 
genes (AT5G16480 and Os10g08580) in 
Nicotiana benthamiana leaves 

We chose miR172 and two of its putative novel target 
genes, one in Arabidopsis, AT5G16480, and the other in 
rice, Os10g08580, and experimentally analyzed their 
transient co-expression in N. benthamiana leaves. 
Arabidopsis MIR172a (the italic font means a sequence 
used in a construct) was amplified using locus-specific 
primers. Similarly, the full-length AT5G16480 and a partial 
gene product of Os10g08580 (~600 bp) harboring miR172 
complementary sites were amplified from Arabidopsis and 
rice, respectively (primer sequences were listed in 
Supplementary Table S17). The amplified fragments were initially 
cloned into a TA vector and sequenced to confirm 
that no mutations/errors were introduced during the 
process. Then the genes were inserted into the XbaI and 
KpnI sites of the binary vector pBIB under the control of the 
super promoter. The constructs harboring Ath-MIR172a, 
AT5G16480 or Os10g08580 were transformed into 
A. tumefaciens strain GV3101 and these cell cultures 
were infiltrated into N. benthamiana leaves as described 
by English et al. (40). For co-expression analysis, equal 
amounts of Agrobacterium cultures containing 
Ath-MIR172a and AT5G16480 or Os10g08580 were 
mixed before infiltration into N. benthamiana leaves. 

Table 1. The conserved miRNA targets of A. thaliana and O. sativa 
The A.t. and O.s. columns list the numbers of targets of A. thaliana and O. sativa that were reported in literature, respectively. The WT, xrn4 and osa 
columns list the numbers of targets in the A.t. and O.s. columns that are predicted by SeqTar in the three data sets, respectively. The WT New, xrn4 
New and osa New columns list the numbers of targets that belong to the same family and are newly predicted by SeqTar. The numbers in parentheses 
are the numbers of targets whose miRNA complementary sites are predicted but have no valid reads. A potential 
target is counted if it is targeted by at least one member of the miRNA family. 



RESULTS 

Summary of the predictions from SeqTar 

We analyzed three degradome data sets, two from 
Arabidopsis (WT and xrn4) and one from rice (osa) 
(see 'Materials and Methods' section) using SeqTar. 
SeqTar predicted a total of 235 695, 240 107 and 667 009 
sRNA:target pairs in the WT, xrn4 and osa data sets, 
respectively (Figure 1). After removing duplicate and 
redundant pairs of different mature miRNAs and alternatively 
spliced transcripts, 183 194, 188 109 and 461 877 
sRNA:target pairs were obtained from the WT, xrn4 
and osa data sets, respectively (see Supplementary 
Methods for details). In addition to the 428 Arabidopsis 
sRNA:target pairs summarized in Supplementary 
Table S1, Howell et al. (9) reported that ath-miR161-1, 
ath-miR161-2, ath-miR400 and seven tasiRNAs derived 
from athTAS1/2 transcripts can regulate a total of 40 
PPR transcripts. We thus did not treat the pairs consisting 
of these 10 sRNAs and these 40 PPR transcripts from 
the non-redundant pairs as novel targets in Figure 1. 
Figure 1. The numbers of predicted targets. m and v stand for the 
number of mismatches and the number of valid reads, respectively. 
Cat. I and Cat. II are the Category I and Category II sRNA:target 
pairs classified by their P_v and P_m values, respectively, as shown in 
Figure 2. Boxes with thin and thick edges are operations and results, 
respectively. 'Reported' means the number of miRNA:target pairs 
reported in literature, as summarized in Supplementary Tables S1 and 
S2. The predicted targets in the blue dashed box are used to find combinatorially 
regulated targets. Cat. I and Cat. II miRNA:target pairs in 
this box are given in the Supplementary Tables S6-S8 and S14-S16 for 
the WT, xrn4 and osa data sets, respectively. 



After removing the reported pairs, there were 182 673, 
187 582 and 461 505 newly identified pairs in the WT, 
xrn4 and osa data sets, respectively. These pairs were classified 
into Category I (with P_m < 0.1 and P_v < 10^-5) and 
Category II (with P_m < 0.1 and P_v > 10^-5). Many new 
sRNA:target pairs, specifically 3386, 925 and 3101 pairs 
in the WT, xrn4 and osa data sets, respectively, belonged 
to Category I (see Figure 2d-f). These numbers were further 
reduced to 2809, 859 and 3036 (in Supplementary 
Tables S6-S8) after considering a minimum of five valid 
reads as a cutoff. Some pairs in Category I (i.e. 88, 39 and 
92 in WT, xrn4 and osa, respectively) had 
≤3 mismatches. After combining results from the 
WT and xrn4 data sets, we found 103 novel Category 
I sRNA:target pairs with ≤3 mismatches for 
Arabidopsis. Many newly identified targets (solid 
diamonds in Figure 2d-f) in Category I had >3 
mismatches, but had strong accumulations of valid reads 
as indicated by their P_v values. Among these identified 
targets, 4 and 6 with >3 mismatches from Arabidopsis 
and rice, respectively, were validated (red solid diamonds 
in Figure 2d-f; Figures 3 and 4; Tables 2 and 3). 

Predicted targets in Category II with ≤3 mismatches 
(3700, 3762 and 7148 in the WT, xrn4 and osa data sets, 
respectively) may not be expressed, or may be expressed at a low level, in the 
sequenced tissues (Supplementary Tables S14-S16). 
Nevertheless, 81, 67 and 176 sRNA:target pairs from the 
WT, xrn4 and osa data sets, respectively, had at least five 
valid reads. After combining the results from the WT and 
xrn4 data sets, we had 128 novel targets belonging to 
Category II with ≤3 mismatches and ≥5 valid reads 
from Arabidopsis. 

Validation of the results from SeqTar 

In order to verify that SeqTar functions as expected, we 
first analyzed its performance on the Arabidopsis and rice 
degradome data sets for identification of reported sRNA 
targets. Of the 428 reported targets of Arabidopsis, SeqTar 
recovered 402 and 405 pairs (a total of 412 when merged) 
from the WT and xrn4 data sets (Supplementary Table S1), 
respectively, with a P_m threshold of 0.1; the remaining 16 
reported targets could be identified with a relaxed P_m 
threshold. Consequently, SeqTar achieved a sensitivity 
of 96.3% (412/428) with a P_m threshold of 0.1 in identifying 
the reported pairs of Arabidopsis. In rice, SeqTar 
identified 381 out of the 457 reported sRNA:target pairs 
(Supplementary Table S2), achieving a sensitivity of 
83.4% with a P_m threshold of 0.1. After relaxing the P_m 
threshold, SeqTar could predict 17 additional reported 
pairs in rice. 
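
The sensitivity figures quoted above follow directly from the recovered and reported pair counts. A short Python check, with a hypothetical helper name, is given below.

```python
def sensitivity_percent(recovered, reported):
    """Percentage of reported sRNA:target pairs that were recovered."""
    if reported <= 0:
        raise ValueError("reported must be positive")
    return 100.0 * recovered / reported


# Counts taken from the text: Arabidopsis 412/428, rice 381/457.
arabidopsis = round(sensitivity_percent(412, 428), 1)  # 96.3
rice = round(sensitivity_percent(381, 457), 1)         # 83.4
```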

We further analyzed SeqTar's capability to identify 
the conserved sRNA targets in Table 1. SeqTar successfully 
found most of these targets, 225/230 for the WT and xrn4 
data sets and 122/123 for the osa data set, as 
shown in the last row of Table 1. The missing miRNA:target 
pairs included miR403:AT1G31290 and four miR895:F-Box 
pairs in Arabidopsis and the miR398:CCS1 pair in rice. 
However, these pairs were found with a relaxed P_m threshold. These 
results indicate that SeqTar is sensitive in identifying 
conserved sRNA targets. 

Comparisons with CleaveLand 

We compared the results of SeqTar with those of 
CleaveLand (16) reported in the starBase (39). The two 
degradome data sets of ref. (14) and four degradome data 
sets of ref. (15) from Arabidopsis were combined and used 
in the starBase. Similarly, in the starBase, rice miRNA 
target predictions were performed by combining 
the degradome data sets in refs (18,20). CleaveLand 
(version 2) (16) was used in the starBase to predict 
miRNA:target pairs with at least one read from these 
combined degradome data sets (39). 

The duplicate miRNA:target pairs from starBase/ 
CleaveLand, due to individual members of a miRNA 
family and alternatively spliced target transcripts, were 
removed to obtain 13 399 and 13 279 unique miRNA: 
target pairs in Arabidopsis and rice, respectively. The 
duplicate pairs from SeqTar prediction were also 
removed; the remaining pairs, collectively named 
SeqTar-All, were then compared with CleaveLand's 
results. Here, SeqTar's results on the WT and xrn4 data 
sets were combined to form its results for Arabidopsis. 
In order to assess the ability of SeqTar to find 
miRNA:target pairs with valid reads, we also compared 
CleaveLand's results to the pairs with at least one valid 
read predicted by SeqTar, named SeqTar-VR. Then, 
the results of CleaveLand and SeqTar were further 
checked against the reported pairs summarized in 
Supplementary Tables S1 and S2 to compare their performances 
on detecting the known targets. 

Figure 2. The P_v and P_m of sRNA:target pairs. (a) The sRNA:target pairs of WT and WT New in Table 1. (b) The sRNA:target pairs of xrn4 and 
xrn4 New in Table 1. (c) The sRNA:target pairs of osa and osa New in Table 1. (d) The new sRNA:target pairs in the WT data set that are not 
shown in (a). (e) The new sRNA:target pairs in the xrn4 data set that are not shown in (b). (f) The new sRNA:target pairs in the osa data set that are not 
shown in (c). Circles stand for reported sRNA:target pairs, black diamonds stand for newly identified sRNA:target pairs, and red diamonds stand for 
newly identified sRNA:target pairs that had been verified with the RLM 5'-RACE experiments. Green circles and green diamonds stand 
for reported siRNA:target and new siRNA:target pairs, respectively. I, II, III and IV are the four Categories of sRNA:target pairs classified by their 
P_v and P_m values. 

SeqTar performed better than CleaveLand in identifying 
the reported pairs. On Arabidopsis, 
SeqTar identified 50 more reported miRNA:target pairs 
with valid reads than CleaveLand even though the starBase 
analysis used four additional degradome data sets from ref. (15) (Table 
4). On rice, similarly, SeqTar outperformed CleaveLand 
by identifying 28 additional reported miRNA:target pairs 
with valid reads (Table 4). When taking the pairs without 
valid reads into account, SeqTar performed substantially better 
than CleaveLand by identifying about 43% 
and 42% more reported pairs in Arabidopsis and rice, respectively 
(Table 4). 



The numbers of common predictions from SeqTar-All, 
SeqTar-VR, starBase/CleaveLand, and reported pairs 
were summarized in Table 4. In both Arabidopsis and 
rice, ~54% of CleaveLand's pairs overlapped with 
SeqTar-All. The remaining CleaveLand pairs that were not 
found in SeqTar-All had an average score of 6.7 in both 
species. We thus speculated that the P_m threshold of 0.1 
of SeqTar might be too stringent to identify these pairs. 
After relaxing P_m to 0.2, SeqTar identified more pairs 
that overlapped with CleaveLand's results: 2004 new pairs in 
Arabidopsis and 2585 new pairs in rice in addition to those 
in Table 4. 

Conserved miRNAs target additional members of known 
target gene families 

SeqTar's results were analyzed to find whether the 
conserved miRNAs targeted additional members of the 
same gene families. Thirty, twenty-eight and twenty-six 
new targets for the conserved miRNA families had valid 
reads in the three data sets respectively (see the WT New, 
xrn4 New and osa New columns of Table 1), suggesting 
that additional members of these target gene families were 
also cleaved. These newly found targets generally had 
more mismatches in their complementary sites (>4) 
than those reported, which could explain why these 
targets could not be identified in previous studies 
(2,4,6,7,9,14,15,26-29). Details of these newly found 
targets, along with the previously reported ones, were listed in 
Supplementary Tables S3-S5. 

Figure 3. The experimentally verified novel miRNA targets of Arabidopsis. (a) ath-miR172ab:AT1G24793. (b) ath-miR396b:AT1G53910. 
(c) ath-miR779-2:AT5G17240. (d) ath-miR172ab:AT5G16480. (e) ath-miR398a:AT3G27200. (f) The conservation of ath-miR398a site on 
AT3G27200. Abbreviated names, Aly, Zma, Bol, Nta, Rra and Sbi stand for A. lyrata PID:484503, Zea mays DQ245243, Brassica oleracea 
DK501936, N. tabacum FS399926, Raphanus raphanistrum subsp. maritimus FD965811, and Sorghum bicolor Sb05g007160, respectively. In parts 
(a) to (e), the x-axis is the position on the transcript, and the y-axis is the number of reads detected from a position. The arrows in the upper parts 
correspond to the positions pointed by the arrows of the same colors in the lower parts. The numbers above the arrows are the numbers of reads 
detected at those positions on the WT data set. The numbers in parentheses are the cleavage frequencies determined by the RLM 5'-RACE 
experiments. 

We also examined the P-values of the complementary 
sites and valid reads of these conserved sRNA targets 
(Figures 2a-c). Most conserved targets have very small 
P_v values (<10^-5) and almost all conserved targets have 
P_m values <0.1. The only exception was the 3' targeting 
site of miR390 on TAS3b (AT5G49615) with 6.5 
mismatches (9,23). A proper threshold of P_v needs to be 
established in order to remove those targets that only 
had a few valid reads, which might be random degradation 
products. Because the P_v values of most conserved 
sRNA targets with valid reads (106/120, 107/120 and 
73/89 for the WT, xrn4 and osa data sets, respectively) 
were <10^-5 (Supplementary Tables S3 to S5, respectively), 
we used a P_v value of 10^-5 to identify reliable 
sRNA:target pairs, as indicated by the blue lines in 
Figure 2. 

Based on the criteria of P_m = 0.1 and P_v = 10^-5, all 
predicted targets could be grouped into four categories: 
Category I with P_m < 0.1 and P_v < 10^-5, Category II 
with P_m < 0.1 and P_v > 10^-5, Category III with P_m > 0.1 
and P_v > 10^-5, and Category IV with P_m > 0.1 
and P_v < 10^-5 (Figure 2). The miRNA:target pairs in 
Category I were the most reliable among all four 
categories because this category had both satisfactory 
complementary sites and enriched valid reads. The pairs 
in Category II, such as ath-miR163:SAMT in the WT data 


e28 Nucleic Acids Research, 2012, Vol. 40, No. 4 






[Figure 4, panels (a) to (f): T-plots (x-axis: position in cDNA (bp); y-axis: number of reads) with miRNA:target alignments for Os06g01304, Os07g36170, Os02g27400, Os05g34720, Os10g08580 and Os07g22930.] 

Figure 4. The experimentally verified novel miRNA targets of rice (Oryza sativa). (a) osa-miR1319:Os06g01304. (b) osa-miR171h:Os07g36170. 
(c) osa-miR1852:Os02g27400. (d) osa-miR530-3p:Os05g34720. (e) osa-miR172d:Os10g08580 and osa-miR1425:Os10g08580. (f) osa-miR1867:Os07g22930 
and osa-miR1436:Os07g22930. For details refer to the legend of Figure 3. The T-plots and numbers of reads are the results on the 
osa data set. In part (f), the underlined nucleotides indicate the overlapped regions of different miRNA binding sites. 



set, might also be genuine targets but with no or limited 
valid reads, which resulted in insignificant P_v values. Only 
one reported pair (miR390:AtTAS3b) belonged to 
Category III (Figure 2a) and IV (Figure 2b) in the WT 
and xrn4 data sets, respectively. 
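
The four-category grouping by the P_m and P_v thresholds can be expressed as a small decision function. The Python sketch below is illustrative only: the function name classify_pair is hypothetical, and treating values exactly equal to a threshold as not passing it is an assumption, since the text does not state how ties are handled.

```python
def classify_pair(p_m, p_v, pm_cutoff=0.1, pv_cutoff=1e-5):
    """Assign an sRNA:target pair to Category I, II, III or IV.

    p_m: P-value of the number of mismatches in the complementary site.
    p_v: P-value of the valid reads at the central region of the site.
    """
    good_site = p_m < pm_cutoff        # satisfactory complementary site
    enriched_reads = p_v < pv_cutoff   # enriched valid reads
    if good_site and enriched_reads:
        return "I"
    if good_site:
        return "II"
    if enriched_reads:
        return "IV"
    return "III"
```

Under these cutoffs, a pair with a well-matched site but few valid reads falls in Category II, whereas a pair with strong read accumulation but a weak site falls in Category IV.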

We identified additional targets in Category I
(Figures 2a-c and Supplementary Tables S3-S5). These
targets included seven MYB family members (targeted
by miR858; see also Table 2) and two PPR members
(targeted by miR400) in Arabidopsis (after combining
the results of the WT and xrn4 data sets), and an F-box
member (Os05g37690, targeted by miR393) in rice.
These newly found targets had more than three
mismatches when aligned with the respective miRNAs.
Other MYB family transcription factors have been
reported to be targets of miR828 (41) and miR858
(14,15) in Arabidopsis. Our results suggest that
more MYB family members are targets of these two
miRNA families (Table 2).

Novel targets of conserved miRNAs and experimental 
validations 

It is known that conserved miRNAs target members of the
same gene families (as summarized in Table 1). To identify
additional targets for conserved miRNAs and to determine
whether non-conserved miRNAs were functional, we chose
the top two targets that had the largest number of reads
at their complementary sites (with the smallest P_v values)
for each sRNA in Arabidopsis and rice, respectively.
Table 2. Some newly found sRNA targets of A. thaliana that belong to Category I

| sRNA | Locus | M | VR | P_v | Percentage (%) | Target |
|---|---|---|---|---|---|---|
| [rows illegible in source] | | | | | | |
| ath-miR3933 | AT1G08980 | 5 | 41 | 2.69E-57 | 4.4 | AMI1 (amidase 1); amidase/hydrolase |
| ath-miR4228 | AT4G37020 | 5 | 24 | 9.18E-44 | 14.6 | Unknown protein |
| ath-miR4239 | AT1G70830 | 4.5 | 151 | 4.92E-134 | 2.9 | MLP28 (MLP-LIKE PROTEIN 28) |
| ath-miR4239 | AT1G70250 | 4.5 | 6 | 2.37E-13 | 10.2 | Receptor serine/threonine kinase |
| TAS1a D4(+) | AT3G06940 | 3 | 6 | 2.6E-11 | 4.4 | Transposable element gene |
| TAS1a D9(-) | AT4G14510 | 3.5 | 8 | 2.9E-14 | 3.4 | RNA binding |
| TAS1c D6(-) | AT2G39681 | 2 | 174 | 3.9E-229 | 5.4 | TAS2; other RNA |
| TAS2 D9(-) | AT2G39681 | 0 | 261 | 4.36E-319 | 8.5 | TAS2; other RNA |
| TAS3c D4(+) | AT2G19260 | 4.5 | 6 | 4.6E-13 | 9.1 | ELM2 domain-containing protein; PHD finger |
| AT1G62910-tasi4 | AT4G16570 | 2.5 | 8 | 1.6E-13 | 6.3 | PRMT7; protein Arginine methyltransferase 7 |

The columns M, VR, P_v and Percentage give the mismatches in the sRNA complementary sites, the number of valid reads, the P-value of valid
reads and the percentage of valid reads, respectively. In the Target column, PPR protein stands for pentatricopeptide (PPR) repeat-containing protein.
The sRNA:target pairs that are verified by the 5'-RACE assay are shown in bold face. The VR, P_v and Percentage values are calculated from
either the WT or the xrn4 data set where the larger accumulation of valid reads is found.
Table 3. Some newly found sRNA targets of Oryza sativa that belong to Category I

| sRNA | Locus | M | VR | P_v | (%) | Target (cDNA) |
|---|---|---|---|---|---|---|
| [rows illegible in source] | | | | | | |
| miR172d | Os10g08580 | 5 | 319 | 0.0E+00 | 11.4 | FAD binding domain of DNA photolyase domain containing protein |
| [rows illegible in source] | | | | | | |
| miR1439 | Os03g11490 | 4.5 | 62 | 7.9E-88 | 20.7 | Expressed protein |
| [rows illegible in source] | | | | | | |
| miR2874 | Os12g44350 | 5 | 34 | 1.2E-42 | 7.8 | Actin |
| miR2878-3p | Os02g40900 | 5.5 | 180 | 2.5E-318 | 37.7 | RNA recognition motif containing protein |
| miR2878-5p | Os03g07110 | 5.5 | 18 | 1.3E-46 | 30.0 | Calmodulin-binding protein |
| miR2878-5p | Os11g19100 | 5 | 87 | 1.4E-101 | 5.2 | Retrotransposon protein |
| miR2925 | Os08g03590 | 3.5 | 38 | 9.2E-54 | 10.2 | Expressed protein |
| miR2926 | Os07g33660 | 4 | 43 | 3.1E-51 | 6.3 | Expressed protein |
| miR2926 | Os05g29020 | 4 | 25 | 9.1E-49 | 10.5 | Expressed protein |
| miR2929 | Os03g19240 | 4.5 | 17 | 5.1E-24 | 4.6 | AMP-binding enzyme, putative, expressed |
| miR2930 | Os02g44870 | 4.5 | 73 | 2.7E-34 | 2.6 | Dehydrin, putative, expressed |
| miR2931 | Os10g30951 | 3.5 | 36 | 1.5E-35 | 1.5 | Expressed protein |

For details refer to the legend of Table 2.



The obtained pairs were manually inspected based on the
number of valid reads and the number of mismatches.
The resulting miRNA:target pairs in Arabidopsis and rice
are listed in Tables 2 and 3, respectively.
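
The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SeqTar's own code: the tuple layout and the function name `top_targets` are hypothetical, and the three example records reuse values from rows of Table 2.

```python
from collections import defaultdict


def top_targets(pairs, k=2):
    """Return, for each sRNA, the k targets with the most valid reads.

    `pairs` is an iterable of (sRNA, locus, valid_reads, p_v) tuples
    (a hypothetical layout). Ties on valid reads go to the smaller P_v.
    """
    by_srna = defaultdict(list)
    for srna, locus, valid_reads, p_v in pairs:
        by_srna[srna].append((locus, valid_reads, p_v))
    return {
        srna: sorted(hits, key=lambda h: (-h[1], h[2]))[:k]
        for srna, hits in by_srna.items()
    }


# Example records taken from rows of Table 2.
pairs = [
    ("ath-miR4239", "AT1G70250", 6, 2.37e-13),
    ("ath-miR4239", "AT1G70830", 151, 4.92e-134),
    ("ath-miR3933", "AT1G08980", 41, 2.69e-57),
]
selected = top_targets(pairs)
```

The manual inspection mentioned in the text would then be applied to the pairs kept in `selected`.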



As mentioned in the 'Materials and Methods' section,
we selected a total of 19 predicted targets, 7 from
Arabidopsis and 12 from rice, for experimental validation.
Of these genes, four were not amplified in the tissue tested,
which could be due to abundance below the detectable
level. Of the 15 amplified genes, 12 genes were cleaved
at the expected sites, as shown in Figures 3, 4 and
Supplementary Figure S4e.

Table 4. The comparisons between the CleaveLand Pipeline and the
SeqTar pipeline

| | SeqTar-All | SeqTar-VR | starBase/CL | Reported | Total |
|---|---|---|---|---|---|
| Arabidopsis | | | | | |
| SeqTar-All | - | 41020 | 7215 | 412 | 246227 |
| SeqTar-VR | 41020 | - | 5966 | 277 | 41020 |
| starBase/CL | 7215 | 5966 | - | 227 | 13399 |
| Reported | 412 | 277 | 227 | - | 428 |
| Rice | | | | | |
| SeqTar-All | - | 76497 | 7375 | 382 | 487305 |

Our analyses revealed that conserved miRNAs target
new gene families that have more mismatches at
the miRNA complementary sites (Tables 2 and 3).
For instance, ath-miR398a targets AT3G27200, a
plastocyanin-like domain-containing protein, with 4.5
mismatches (Table 2 and Figure 3e). Homologs of this
gene in many plant species, but not all, possess miR398
complementary sites (Figure 3f). These results indicated
that the miR398 family in some plant species target three
conserved gene families, in addition to the two reported
families, CSD and CCS1 (Table 1). Ath-miR172ab targets
five N-acetylglucosamine deacetylase family transcripts
(with 4.5 mismatches, see Supplementary Tables S6
and S7), one of which (AT1G24793) was validated
(Figure 3a); ath-miR172ab also targets AT5G16480 (a
tyrosine-specific protein phosphatase), which was
validated as well (with five mismatches, see Figure 3d). Similarly,
osa-miR171h:Os07g36170 (a chitin-inducible
gibberellin-responsive protein) has 4.5 mismatches and
osa-miR172d:Os10g08580 (a FAD binding domain of DNA
photolyase domain containing protein) has five
mismatches (Table 3), and both were validated (Figure 4b
and e). The miR396 family targets the GRF
(Growth-Regulating Factor) family (15,18). In our study, we
found that ath-miR396 can also regulate RAP2.12, a
member of the ERF/AP2 transcription factor family. The
miR396b cleavage site on AT1G53910 (RAP2.12) was
validated using the 5'-RACE assay although there is a
mismatch at position 11 (Figure 3b and Table 2). These
examples illustrate that some of the conserved miRNA
families can target more than one gene family in
Arabidopsis and rice.
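
The mismatch counts quoted above (for example 4.5, or a mismatch at position 11) can be illustrated with a small scoring sketch. It is not SeqTar's alignment procedure: it assumes an ungapped duplex, assumes that a G:U wobble pair counts as 0.5 mismatch (which is how half-integer values of M would arise), and uses toy sequences that are not taken from the figures. The function name `score_site` is hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def score_site(mirna, site):
    """Score an ungapped miRNA:target duplex.

    mirna: miRNA sequence written 5'->3'.
    site:  target complementary site written 5'->3', same length.
    Returns (score, imperfect): the mismatch score (assumed weights:
    mismatch = 1, G:U = 0.5) and a dict mapping each imperfect miRNA
    position (1-based, from the miRNA 5' end) to 'mismatch' or 'GU'.
    """
    if len(mirna) != len(site):
        raise ValueError("this sketch handles ungapped duplexes only")
    score = 0.0
    imperfect = {}
    # miRNA position i pairs with the i-th base from the 3' end of the site.
    for i, (m, t) in enumerate(zip(mirna, reversed(site)), start=1):
        if (m, t) in WATSON_CRICK:
            continue
        if (m, t) in WOBBLE:
            score += 0.5
            imperfect[i] = "GU"
        else:
            score += 1.0
            imperfect[i] = "mismatch"
    return score, imperfect


# Toy sequences for illustration only.
toy_mirna = "UGAAGCUGCCAGCAUGAUCUA"
perfect_site = "UAGAUCAUGCUGGCAGCUUCA"   # reverse complement of toy_mirna
relaxed_site = "UAGAUCAUGUCGGCAGCUUCA"   # A:C at position 11, G:U at position 12

perfect_score = score_site(toy_mirna, perfect_site)
relaxed_score = score_site(toy_mirna, relaxed_site)
```

A conventional filter would reject `relaxed_site` because of the mismatch at position 11, whereas a relaxed criterion keeps it as a candidate and leaves the decision to the degradome reads.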

As shown in Figures 3d and 4e, AT5G16480 in
Arabidopsis and Os10g08580 in rice are miR172 targets.
To provide further experimental evidence on the accuracy
of SeqTar, we infiltrated A. tumefaciens harboring the
ath-miR172a primary transcript and two target genes,
one from Arabidopsis (AT5G16480) and the other from
rice (Os10g08580), into N. benthamiana leaves for
transient co-expression analysis. The result confirmed the
expression of miR172 in the mock, miR172,
AT5G16480/Os10g08580 and miR172+AT5G16480/Os10g08580
infiltrated leaves. As expected, miR172 accumulation was
significantly higher in leaves infiltrated with miR172
and miR172+AT5G16480/Os10g08580 than in leaves
infiltrated with mock and AT5G16480/Os10g08580
(Figure 5a and b). miR172 is a highly conserved miRNA
in plants, so its detection in mock and
AT5G16480/Os10g08580 infiltrated N. benthamiana leaves
is not surprising; the signal detected in these cases may
be due to endogenous miR172 in N. benthamiana
(Figure 5a and b). Transcripts of AT5G16480 or
Os10g08580 were detected in tobacco leaves
infiltrated with the respective constructs. Similarly, these
transcripts were also detected in leaves infiltrated with
AT5G16480/Os10g08580 along with miR172, but not in
mock and miR172 infiltrated leaves (Figure 5a and b).
AT5G16480/Os10g08580 expression levels were very
high in leaves infiltrated with AT5G16480/Os10g08580
alone, but were substantially reduced in the
leaves when miR172 and AT5G16480/Os10g08580 were
co-expressed (Figure 5a and b). These results indicated
that the targets identified by SeqTar are indeed genuine
and that miR172 can target and cleave the
AT5G16480/Os10g08580 transcripts in Arabidopsis/rice.

Identification of new targets of non-conserved miRNAs 
and siRNAs 

Many non-conserved miRNAs in Arabidopsis and rice
were found to have cleavable targets, e.g.
ath-miR779-2:AT5G17240 (Figure 3c), ath-miR3932b:AT2G30620,
ath-miR3933:AT1G08980 and ath-miR4239:AT1G70830
(Table 2), and osa-miR1319:Os06g01304 (Figure 4a),
osa-miR1852:Os02g27400 (Figure 4c),
osa-miR2878-3p:Os02g40900 and osa-miR2878-5p:Os11g19100 (Table 3).
Some of the pairs, such as ath-miR860:AT5G26030
with 0.5 mismatches (Table 2) and
osa-miR2123a-c:Os02g34950 with 1 mismatch (Table 3), were highly
complementary. Unlike the conserved miRNAs, which target
many transcription factors, only a few transcription factors
were identified as targets of non-conserved sRNAs in
Arabidopsis and rice. As listed in Table 2, only
seven targets in Arabidopsis, i.e. ARF3 (AT2G33860,
targeted by miR400), bZIP7 (AT4G37730, targeted
by miR413), MYB107 (AT3G02940, targeted by
miR828), NF-YA7 (AT1G30500, targeted by miR850),
MYB11 (AT3G62610, targeted by miR858), MYB34
(AT5G60890, targeted by miR858) and HSFA8
(AT1G67970, targeted by miR3434), are transcription
factors.

In rice, a non-conserved miRNA, osa-miR530-3p,
targeted Os05g34720, a transcription factor, which was
also validated in this study (Figure 4d and Table 3). The
non-conserved miRNAs osa-miR1436 and osa-miR1867
target Os07g22930, a starch synthase protein (Figure 4f
and Table 3). osa-miR1439 also has a complementary
site with 3.5 mismatches on Os07g22930, which has 3
valid reads (P_v = 0.06), 3 nt upstream of the osa-miR1436
complementary site (Figure 4f). Interestingly, our analysis
suggests that osa-miR1436 and osa-miR1439 can also
combinatorially regulate another starch synthase, Os06g06560
(Supplementary Figure S2). These results suggested that
osa-miR1436, osa-miR1439 and probably osa-miR1867
can regulate genes implicated in starch synthesis
pathways in rice.

Figure 5. The validation of AT5G16480 and Os10g08580 as targets of miR172 using the transient co-expression assay. N. benthamiana leaves were
infiltrated with infiltration medium (mock); Agrobacteria harboring Ath-MIR172a alone (miR172); Agrobacteria harboring the Arabidopsis transcript
AT5G16480 or the rice transcript Os10g08580 alone (AT5G16480/Os10g08580); or Agrobacteria co-expressing Ath-MIR172a and a target gene
(miR172+AT5G16480/miR172+Os10g08580). For the co-expression, equal amounts of Agrobacterium cultures containing Ath-MIR172a and AT5G16480 or Os10g08580
were mixed before infiltration into N. benthamiana leaves. U6 and actin served as loading controls for miR172 and target gene (AT5G16480 or
Os10g08580) detection, respectively. (a) The validation of AT5G16480. (b) The validation of Os10g08580.

Furthermore, our analysis also suggested that some
siRNAs derived from both TAS1/2 and PPR transcripts
might also target other transcripts. For example,
TAS1a_D4(+) can target AT3G06940, a transposable
element, and AT1G62910-tasi4 (an siRNA derived from
AT1G62910) can target AT4G16570, Protein Arginine
Methyltransferase 7 (Table 2).

The combinatorial regulations of mRNA targets 

In order to investigate potential combinatorial regulation
by different miRNA families, we examined the previously
reported miRNA:target pairs (Supplementary Tables S1
and S2) and the pairs in the dashed box of Figure 1
(Supplementary Tables S6-S8 for Category I pairs, and
S14-S16 for Category II pairs, respectively). Some of the
combinatorially regulated targets are shown in Figures
6 and 7. For instance, AT3G26810 (an F-box family
protein) was a known target of ath-miR393 (15,28). Our
analysis suggested that AT3G26810 could also be
regulated by ath-miR396b (Figure 6b). Zhou et al. (20)
reported that osa-miR806 guided cleavage of
Os02g43370 (Supplementary Table S2). We found that
osa-miR2123 can also regulate Os02g43370. The complementary
sites of osa-miR806 and osa-miR2123 on Os02g43370 are partially
overlapping (Figure 7b). Similarly, osa-miR446 can
regulate Os02g29140 (19,20) (Supplementary Table S2).
Our analysis shows that osa-miR809 can target the
Os02g29140 transcript with a partially overlapping
complementary site (Figure 7h). We also found that
osa-miR809, osa-miR446 and osa-miR808 combinatorially
regulate several other transcripts, such as Os01g15520,
Os06g19990, Os08g40440, Os10g26720 and Os12g12950
(Supplementary Table S8), indicating the existence of
several common targets of these three miRNAs.
Furthermore, AT5G38480 was found to be cleaved by
AT1G62910-tasi4 and ath-miR167 (Figure 6f), suggesting
a combinatorial regulation resulting from a PPR-derived
siRNA and a miRNA. TAS3-derived siRNAs are known
to target the ARF3 (AT2G33860) transcript (6,15,26).
Additionally, our analysis revealed that ath-miR400
could also target the ARF3 transcript but at a different site
with 4.5 mismatches (Supplementary Figure S1). These
results, together with many other examples in the
current study (Figures 6 and 7 and Supplementary
Tables S6-S8), suggested that one transcript could be
targeted by two or more different sRNAs in Arabidopsis
and rice.
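
Whether two complementary sites on one transcript are separate or partially overlapping, as described for several of the targets above, reduces to an interval comparison. The sketch below is illustrative only: the function names are hypothetical and the coordinates are invented, since the binding-site positions are given in the figures rather than in the text.

```python
def shared_nucleotides(site_a, site_b):
    """Count positions shared by two sites given as 1-based inclusive (start, end)."""
    start = max(site_a[0], site_b[0])
    end = min(site_a[1], site_b[1])
    return max(0, end - start + 1)


def classify_sites(site_a, site_b):
    """Label a pair of complementary sites on the same transcript."""
    shared = shared_nucleotides(site_a, site_b)
    if shared == 0:
        return "separate sites"
    if site_a == site_b:
        return "same site"
    return "partially overlapping"


# Invented coordinates (position in cDNA, bp), for illustration only.
overlap_example = classify_sites((100, 120), (112, 132))
separate_example = classify_sites((100, 120), (300, 320))
```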

Self- and cross-repression of TAS/PPR transcripts 

Mapping 20 nt reads to the TAS transcripts suggested that
the TAS1a (AT2G27400), TAS1c (AT2G39675) and TAS2
(AT2G39681) transcripts are subjected to cleavages
guided by the siRNAs derived from their own precursors
(Supplementary Figure S4). In addition to ath-miR173
cleavage sites, all these transcripts are regulated by at
least one other siRNA, TAS1c_D6(-). The regulation of
TAS2 by the TAS1c_D6(-) siRNA was validated using the
5'-RACE assay (Supplementary Figure S4e). TAS1c was
regulated by two other siRNAs, TAS1c_D10(-) and
TAS1a_D9(-) (Supplementary Figure S4c and d). TAS2
was regulated by three siRNAs derived from its own
transcript, TAS2_D6(-), TAS2_D9(-) and TAS2_D11(-)
(Supplementary Figure S4e and f). Similarly, cleavage of
TAS4 (AT3G25795) was guided by one of the self-derived
tasiRNAs, TAS4_D4(-) (P_v < 10~* in the WT data set, see
Supplementary Table S9). These results suggested that
tasiRNAs derived from TAS1, TAS2 and probably TAS4
regulate and repress their own transcripts.

Figure 6. The predicted Arabidopsis targets that are combinatorially regulated. (a) AT5G11260. (b) AT3G26810. The blue binding site of
ath-miR393ab was a reported site. (c) AT1G17650. (d) AT3G07990. (e) AT2G27530. (f) AT5G38480. For details refer to the legend of Figure 3.
WT and xrn4 in parentheses indicate the sample from which the T-plots and numbers of reads were obtained.

AT1G62910, a PPR transcript, possessed three target
sites for five different sRNAs (Supplementary Figure S5a
and b). Among the three sites, one had a major peak and
the other two had minor peaks. TAS2_D6(-) could
contribute the major peak, and the two minor peaks could
be attributed to AT1G62910-tasi3/ath-miR161-1 and
AT1G63400-tasi1/ath-miR161-2, where AT1G62910-tasi3
and AT1G63400-tasi1 were miR161-like siRNAs
derived from PPR transcripts (Figure 8b). Similar
regulations on AT1G62930 and AT1G62860 were also identified
(Supplementary Figure S5c-f).

AT1G63080 was targeted by TAS2_D6(-), miR161-1
and miR161-2, and it has been predicted that miR400,
TAS2_D9(-) and TAS2_D11(-) can also target
AT1G63080 (6). Our analysis confirmed that
TAS2_D11(-) indeed induced a major cleavage site on the
AT1G63080 transcript. TAS2_D6(-) and
miR161-1/AT1G62910-tasi3 contribute to another two minor
cleavage sites, respectively (see Supplementary Table S10).
Sixteen other PPR transcripts, i.e. AT1G06580,
AT1G12775, AT1G19720, AT1G26460, AT1G62590,
AT1G62860, AT1G62910, AT1G62930, AT1G63080,
AT1G63130, AT1G63150, AT1G63330, AT1G63400,
AT5G08510, AT5G16640 and AT5G41170, were found
to be cleaved by at least two different sRNAs at different
positions (Supplementary Table S10). As reported in (9),
ath-miR161-1 and ath-miR161-2 can regulate as many as
40 PPR transcripts. Our results suggested that several
siRNAs derived from PPR genes, especially the two
ath-miR161-like siRNAs, AT1G62910-tasi3 and
AT1G63400-tasi1, were involved in self- or
cross-repression of many PPR transcripts (see Supplementary
Table S10). Our results also suggested that a pseudogene
of PPR proteins, AT1G62860, was cleaved by
TAS2_D12(-), TAS2_D9(-), ath-miR161-1 and
AT1G62910-tasi3 (Supplementary Figure S5e and f). In
summary, these results suggest that there is complex
combinatorial self- and cross-repression in the
ath-miR173/TAS/PPR siRNA regulation cascade.

Figure 7. The predicted rice targets that are combinatorially regulated. (a) Os01g44990. (b) Os02g43370. (c) Os03g06960. (d) Os03g55164.
(e) Os04g44800. (f) Os08g08190. (g) Os05g02420. (h) Os02g29140. (i) Os04g41620. For details refer to the legend of Figure 3. The blue sites were
published sites, see Supplementary Table S2. In parts (b), (d) and (h), the underlined nucleotides indicate the overlapped regions of different miRNA
binding sites, and the numbers above the start and end of the target sequences are the start and end positions of the binding sites, respectively.

Figure 8. The self-repression of TAS and PPR transcripts. (a) A schematic view of the ath-miR173/TAS1,TAS2/PPR sRNA generating cascade.
The green arrows stand for the sRNA-mediated regulations that are required to generate sRNAs. The two red dull arrows stand for the cleavages
of transcripts to repress the ever-expanding cascade at the TAS1/2 and PPR level, respectively. (b) The ath-miR161 and ath-miR161-like sRNAs that
are derived from the PPR transcripts. The underlined nucleotides are identical in all four sRNAs.

Self-repression of miRNAs in Arabidopsis 

German et al. (14) found that ath-miR172 can self-repress
the primary transcript of ath-miR172b. Four other
miRNAs, ath-miR390a, ath-miR398b, ath-miR396a and
ath-miR396b, also show similar self-repression guided by
their own mature miRNAs (14). We found that four more
miRNAs, ath-miR163, ath-miR860, ath-miR166f
and ath-miR393b (Supplementary Figure S3), also
self-repressed their own precursors (P_v < 10^-3), suggesting
that the self-repression of pre-miRNAs is more prevalent
in Arabidopsis than previously reported.

The false discovery rate of SeqTar 

We used the method introduced by Storey and Tibshirani 
(42) to evaluate the false discovery rate (FDR) of 
SeqTar's results. We estimated the FDR and q-values 
of P m and P v, respectively. The q-value is a measure of 
significance in terms of the FDR (42). The FDR and 
q-values of all new predictions were <0.05 when the 
thresholds of P m and P v were set to 0.1, except for the 
P v of new and Category II predictions of the osa data 
(Supplementary Table S11). However, these measures were 
<0.05 if a slightly more stringent P v threshold, P v < 0.07, 
was used. Because P m and P v were calculated independently, 
the FDR and q-values of P m and P v were also assumed 
to be independent. Therefore, it was reasonable to expect 
that the FDR and q-value of a predicted sRNA:target pair were 
<0.0025 (0.05^2) when both P m < 0.1 and P v < 0.1 (or 
P v < 0.05 for a large number of predictions such as the 
osa data set) were satisfied. This suggested that the 
FDR of newly predicted sRNA:target pairs was well below 
0.01 when both P m < 0.1 and P v < 0.1 (or P v < 0.05 for 
a large number of predictions) were satisfied. The 
FDRs of the pairs of Category I were <10^-4 
(Supplementary Table S11), indicating that the predictions 
of Category I were highly reliable. The FDR and q-values 
of P m of reported pairs were <0.01, which was consistent 
with the preference for extensively matched complementary 
sites in the reported pairs. The FDR and q-values of 
P v of reported pairs were smaller than those of pairs in Category II 
but larger than those of pairs in Category I (see Supplementary 
Table S11). In summary, the FDR values suggested that 
the results of SeqTar were reliable and had a very low 
proportion of false positives when the thresholds of both P m 
and P v were set to 0.05, or even with P m < 0.1 in all cases 
and P v < 0.1 in most cases (see Supplementary Table S11). 
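
To illustrate how q-values relate to a list of P-values, the following minimal Python sketch computes Storey-style q-values. It is not SeqTar's code. It fixes the tuning parameter lambda at 0.5 instead of using the smoothing step of the original method (42), and the names q_values and joint_bound are hypothetical. The helper joint_bound only multiplies two FDR bounds, which mirrors the independence assumption made above (0.05 x 0.05 = 0.0025).

```python
def q_values(p_values, lam=0.5):
    """Storey-style q-values with a fixed lambda (a simplification of ref. 42)."""
    m = len(p_values)
    if m == 0:
        return []
    # Estimate the proportion of true null hypotheses from P-values above lam.
    pi0 = sum(1 for p in p_values if p > lam) / (m * (1.0 - lam))
    pi0 = min(1.0, pi0)
    if pi0 == 0.0:
        # Choice made for this sketch: fall back to the conservative value 1.
        pi0 = 1.0
    # Visit P-values from largest to smallest so q-values stay monotone.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


def joint_bound(fdr_m, fdr_v):
    """Product of two FDR bounds, valid only under the independence assumption."""
    return fdr_m * fdr_v
```

With the hypothetical input [0.01, 0.02, 0.03, 0.6, 0.8], the estimated proportion of true nulls is 0.8 and the three smallest P-values all receive a q-value of about 0.04.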

Efficiency of SeqTar 

SeqTar used about 1000 and 2000 CPU seconds of an 
Intel Xeon 2.66 GHz 64 bit CPU to search potential 
targets of one sRNA against all transcripts of 
Arabidopsis and rice, respectively. Together with a few 
efficient supporting steps (see Supplementary Methods), 
target prediction on all annotated transcript cDNA 
sequences for all miRNAs and siRNAs in both species 
took a modest number of hours on a 
normal server computer with multiple CPUs. 

DISCUSSION 

SeqTar's improved performance 

In this study, we have demonstrated that SeqTar is a 
more effective and efficient computational method than 
existing approaches for identifying miRNA/siRNA targets 
from degradome data sets in plants. By relaxing the limit on the number of 
mismatches, SeqTar found many new targets for conserved 
and non-conserved miRNAs in Arabidopsis and rice. 
The improved performance of SeqTar could be attributed 
to three major factors. First, instead of setting a subjective 
criterion such as the number of mismatches in its prediction, 
SeqTar used the P-values of mismatches generated 
with shuffled sRNA sequences. Because different miRNA 
families have varying numbers of targets and conserved 
miRNAs tend to bind to regions with high complementarity 
in their targets, P m could better 
differentiate true complementary sites from false ones. 
It is also better to use P-values than a specified number of 
mismatches for miRNAs of different lengths because 
longer miRNAs should be able to tolerate a few 
more mismatches than shorter ones. For example, 
24 nt miRNAs such as ath-miR829.1 (Figure 6e), 
osa-miR1867 (Figure 4f), osa-miR1874-5p (Figure 7e) 
and osa-miR1862 (Figure 7f) could cleave their targets 
despite having >5 mismatches in the complementary 
sites. Second, SeqTar treated mismatches and G:U pairs 
in different positions of sRNA complementary sites 
equally. In previous studies, mismatches and G:U pairs in 
the 2nd to 13th nt region received more penalties (6,15,16) 
and were not allowed at positions 10 and 11 (7). 
However, our results indicated that some sRNA complementary 
sites with mismatches and G:U pairs at these positions 
are also subjected to sRNA-guided cleavages. Eight 
verified miRNA:target pairs (Figures 3a-d and 4a, b, d and 
e) had at least two mismatches within the region of 
the 2nd-13th nt. Among these eight pairs, 
osa-miR171h:Os07g36170 and ath-miR396b:AT1G53190 also 
had a mismatch at position 10 and 11, respectively 
(Figures 3b and 4b). Two published studies (6,43) also support 
our findings. Allen et al. (6) verified that ath-miR173 
can cleave AT1G50055 (TAS1b) even though positions 9 and 
10 of their complementary site are mismatched; Mallory et 
al. (43) demonstrated that a mutated miR165 complementary 
site with a mismatch at position 10 can be cleaved. 
Third, and more importantly, SeqTar took advantage of the 
abundance of valid reads, i.e. reads mapped to the 9-11 nt 
region, to perform a statistical analysis of sRNA complementary 
sites. In particular, the P v values were calculated to 
evaluate the abundance of valid reads at the predicted 
cleavage sites. Combining the P m and P v values enhanced 
SeqTar's sensitivity and specificity beyond those of 
methods that used sequence information 
alone. Our results clearly suggest that the existing criteria for 
predicting targets of sRNAs in plants may be too stringent 
to successfully identify genuine targets with weak 
complementarities. 
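
The idea of scoring mismatches against shuffled sRNA sequences can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not SeqTar's implementation: the duplex is ungapped, a G:U pair costs 0.5 and a mismatch costs 1 (both weights are assumptions), every position is weighted equally, and the function names mismatch_score, best_score and empirical_p_m are hypothetical. The P-value is estimated as the fraction of shuffled sequences whose best site on the transcript scores at least as well as the real sRNA.

```python
import random

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def mismatch_score(srna, site, gu_penalty=0.5):
    """Score an ungapped sRNA:site duplex; all positions are weighted equally."""
    score = 0.0
    # sRNA position i (from its 5' end) pairs with the site read from its 3' end.
    for s, t in zip(srna, reversed(site)):
        if (s, t) in WATSON_CRICK:
            continue
        score += gu_penalty if (s, t) in WOBBLE else 1.0
    return score


def best_score(srna, transcript, gu_penalty=0.5):
    """Lowest mismatch score over all windows of the transcript."""
    n = len(srna)
    if len(transcript) < n:
        raise ValueError("transcript is shorter than the sRNA")
    return min(
        mismatch_score(srna, transcript[i:i + n], gu_penalty)
        for i in range(len(transcript) - n + 1)
    )


def empirical_p_m(srna, transcript, n_shuffles=1000, seed=0):
    """Fraction of shuffled sRNAs that align at least as well as the real one."""
    rng = random.Random(seed)
    observed = best_score(srna, transcript)
    letters = list(srna)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if best_score("".join(letters), transcript) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_shuffles + 1)
```

Because the score is compared with a null distribution built from the same sRNA, a longer sRNA with a few extra mismatches can still obtain a small P-value, which is the motivation given above for preferring P-values over a fixed mismatch count.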

Finally, as a rule of thumb for using SeqTar, if 
P v < 10^-5, a P m threshold of 0.1 can be used to find 
miRNA:target pairs with good sensitivity and reasonable 
specificity. If P v > 10^-5, it is better to use a stringent 
P m value of <0.05 (or 0.01), or alternatively to restrict the 
number of mismatches (m < 4) as proposed 
in early studies. For instance, by using P v < 10~ and 
P m < 0.1, 41.6% and 45.0% of the reported pairs in 
Supplementary Table S1 could be identified on the WT 
and xrn4 data sets, respectively. Then, by using 
P m < 0.05 alone, an additional 43% of the pairs in Supplementary 
Table S1 were identified on both the WT and xrn4 
data sets. Similarly, 132 and 245 out of the 458 reported 
pairs of rice in Supplementary Table S2 could be identified 
on the osa data set by using the same criteria. 
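
The rule of thumb above can be written as a small Python filter. This is a sketch for illustration only: the function name passes_rule_of_thumb and the parameter strict_p_m are hypothetical, and the alternative of restricting the number of mismatches is left out.

```python
def passes_rule_of_thumb(p_m, p_v, strict_p_m=0.05):
    """Apply the P m threshold that matches the strength of the read evidence."""
    if p_v < 1e-5:
        # Strong read support at the cleavage site: the relaxed threshold suffices.
        return p_m < 0.1
    # Weaker read support: require a more stringent alignment P-value.
    return p_m < strict_p_m


# Hypothetical (P m, P v) values for three candidate pairs.
candidates = [(0.08, 1e-6), (0.08, 1e-3), (0.03, 1e-3)]
kept = [c for c in candidates if passes_rule_of_thumb(*c)]
```

Setting strict_p_m to 0.01 gives the more conservative variant mentioned in the text.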

More sRNA targets exist than previously reported 

Even with a very strict criterion of P v < 10~ and <3 
mismatches in complementary sites, SeqTar found 
103 and 92 novel sRNA targets in Arabidopsis and rice, 
respectively. Another 128 and 176 novel target sites in 
Arabidopsis and rice, respectively, had <3 mismatches 
and at least five valid reads. If using P m < 0.1, instead of 
restricting the number of mismatches m<3, and 
P v < 10~ , >3000 novel miRNA:target pairs could be 
detected in both species (see Category I predictions in 
Figure 1 and Supplementary Tables S6-S8). Our results 
suggest that several newly identified non-conserved 
miRNAs are functional. As shown in Supplementary 
Tables S6-S8 and Figures 6 and 7, as well as 
Supplementary Tables S14-S16, a small percentage of 
targets are combinatorially regulated by more than one 
sRNA in these two species. 
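
Given a list of predicted sRNA:target pairs, targets regulated by more than one sRNA can be found by grouping the pairs by target. The Python sketch below shows one way to do this; the function name combinatorially_regulated and the identifiers in the example are placeholders, not predictions from this study.

```python
from collections import defaultdict


def combinatorially_regulated(pairs):
    """Return targets hit by more than one distinct sRNA, with those sRNAs."""
    by_target = defaultdict(set)
    for srna, target in pairs:
        by_target[target].add(srna)
    return {t: sorted(s) for t, s in by_target.items() if len(s) > 1}


# Placeholder identifiers used only to show the call.
example_pairs = [
    ("sRNA_A", "target_1"),
    ("sRNA_B", "target_1"),
    ("sRNA_A", "target_2"),
]
shared = combinatorially_regulated(example_pairs)
```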

sRNA induced self- and cross-repression 

The tasiRNAs derived from TAS1a/c and TAS2 may self- 
and/or cross-target their own transcripts (Figure 8a). Two 
ath-miR161-like siRNAs (Figure 8b) are derived from 
AT1G62910, AT1G62930, AT1G63130 and AT1G63400, 
which are close paralogs of the PPR-P clade proteins (9). 
As shown in Supplementary Figures S5a-f, they might 
potentially target their own transcripts and many 
other PPR transcripts (see Supplementary Table S10). 
As reported by Howell et al. (9), ath-miR161 might 
target as many as 40 PPR transcripts, including the 28 
genes in the PPR-P clade. These observations suggested 
that the ath-miR161-like siRNAs derived from these 
closely related PPR paralogs repressed the ever-enlarging 
sRNA generation cascade originating from ath-miR173 at 
the PPR level (Figure 8a). The current model of the ath-miR173/ 
TAS/PPR cascade suggests that ath-miR173-guided 
cleavage leads to the generation of tasiRNAs on TAS1 
and TAS2, and some of these tasiRNAs induce the generation 
of siRNAs from PPR transcripts. However, our analysis 
suggested that some tasiRNAs repressed their own transcripts 
at the TAS1 and TAS2 level (Figure 8a), and some 
siRNAs generated from PPR genes could potentially be 
involved in the silencing of PPR-P clade transcripts as also 
reported by Howell et al. (9). Furthermore, some 
siRNAs derived from both TAS1/2 and PPR transcripts 
might also target other transcripts. As listed in Table 2, 
TAS1a_D4(+) targeted AT3G06940, a transposable 
element, and AT1G62910-tasi4 targeted AT4G16570, 
Protein Arginine Methyltransferase 7. These results suggested 
that some siRNAs generated from the ath-miR173/ 
TAS/PPR cascade might also have other targets, similar to 
the TAS3-siRNAs targeting the ARF family members 
(Table 1). 

As shown in Supplementary Figure S5e and f, 
our results suggested that a pseudogene of PPR 
proteins, AT1G62860, was regulated by TAS2_D12(-), 
TAS2_D9(-), ath-miR161.1 and AT1G62910-tasi3. 
Poliseno et al. (44) recently found that transcripts 
produced from the pseudogene PTENP1, termed miRNA 
decoys, regulated the expression level of the tumor suppressor 
gene PTEN by absorbing miRNAs that had complementary 
sites on both PTENP1 and PTEN transcripts. 
The case of AT1G62860 demonstrated that the 
miRNA decoy mechanism also applied to trans-acting 
siRNAs, which made the miR173/TAS/PPR pathway even 
more complicated than previously thought (Figure 8a). 

Besides tasiRNAs, our analyses suggested that several 
additional miRNA families, ath-miR163, ath-miR860, 
ath-miR166 and ath-miR393 of Arabidopsis thaliana, 
self-repressed their own primary or precursor transcripts, 
in addition to the ath-miR172, ath-miR390, ath-miR398 
and ath-miR396 families reported in ref. (14). 

CONCLUSIONS 

The contributions of this study are threefold. First, it 
introduced a novel algorithm, called SeqTar, for identifying 
sRNA-induced cleavages captured in degradomes. 
Second, SeqTar identified many new sRNA targets 
in Arabidopsis and rice that could be missed when 
using stringent criteria. Finally, the use of the P v value for 
evaluating the abundance of valid reads is a better 
means than the existing criteria to identify sRNA-guided 
cleavage sites on mRNA targets that have >4 mismatches. 
The extra penalties for mismatches in the 
2nd-13th nt region and the prohibition of mismatches and G:U 
wobble pairs at positions 10 and 11 in the existing 
criteria may cause these targets to be missed. By simultaneously 
taking into consideration the P m value of mismatches and the 
P v value of valid reads, SeqTar achieved a lower false 
positive rate than other methods that used only 
alignment information. Our results suggested the existence 
of more targets with more mismatches and with 
mismatches at position 10 or 11. Our study offered 
novel insights into the principles that sRNAs follow in 
recognizing and degrading their targets in plants. 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1-17, Supplementary Figures 1-5 